A Pipeline for Post-Crisis Twitter Data Acquisition

Authors

  • Mayank Kejriwal
  • Yao Gu
Abstract

Due to the instant availability of data on social media platforms like Twitter, and advances in machine learning and data management technology, real-time crisis informatics has emerged as a prolific research area in the last decade. Although several benchmarks are now available, especially on portals like CrisisLex, an important, practical problem that has not been addressed thus far is the rapid acquisition and benchmarking of data from free, publicly available streams like the Twitter API. In this paper, we present ongoing work on a pipeline for facilitating immediate post-crisis data collection, curation and relevance filtering from the Twitter API. The pipeline is minimally supervised, alleviating the need for feature engineering by including a judicious mix of data preprocessing and fast text embeddings, along with an active learning framework. We illustrate the utility of the pipeline by describing a recent case study wherein it was used to collect and analyze millions of tweets in the immediate aftermath of the Las Vegas shootings.
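The abstract names three ingredients of the relevance filter: text preprocessing, fast text embeddings, and active learning. What follows is a minimal, hypothetical Python sketch of how such pieces can fit together. It is not the authors' implementation. It assumes the following: tweets arrive as plain strings; feature hashing over tokens stands in for pretrained fastText-style word vectors; relevance is scored with a plain logistic regression; and the active learner uses uncertainty sampling, querying the tweet whose predicted relevance is closest to 0.5. All function names (`preprocess`, `embed`, `train`, `active_learning`, `simulated_oracle`), the toy tweets, and the keyword rule that simulates a human annotator are made up for illustration.

```python
import hashlib
import math
import re

# Hypothetical cleaning rules for raw tweet text (assumption, not from the paper).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet: str) -> list[str]:
    """Lowercase, drop URLs and @mentions, keep hashtag words, tokenize."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text).replace("#", " ")
    return TOKEN_RE.findall(text)


def embed(tokens: list[str], dim: int = 64) -> list[float]:
    """Average of per-token vectors. Stand-in (assumption) for pretrained
    fastText-style vectors: each token is hashed to one signed dimension."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0 if (h >> 8) % 2 == 0 else -1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(model, x: list[float]) -> float:
    """Predicted probability that a tweet vector is crisis-relevant."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(xs, ys, epochs: int = 200, lr: float = 0.5):
    """Logistic regression fitted by per-example gradient descent."""
    model = ([0.0] * len(xs[0]), 0.0)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = predict(model, x) - y  # gradient of log loss w.r.t. the logit
            w, b = model
            model = ([wi - lr * g * xi for wi, xi in zip(w, x)], b - lr * g)
    return model


def active_learning(pool, oracle, seed_idx, rounds):
    """Uncertainty sampling: label the pool tweet whose predicted relevance
    is closest to 0.5, retrain, repeat. seed_idx must cover both classes."""
    vecs = [embed(preprocess(t)) for t in pool]
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    model = train([vecs[i] for i in labeled], list(labeled.values()))
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda i: abs(predict(model, vecs[i]) - 0.5))
        labeled[query] = oracle(pool[query])  # ask the (simulated) annotator
        model = train([vecs[i] for i in labeled], list(labeled.values()))
    return model, labeled


def simulated_oracle(tweet: str) -> int:
    # Hypothetical keyword rule standing in for a human relevance judgment.
    return int("shooting" in tweet.lower())


# Toy usage with made-up tweets.
pool = [
    "Shooting reported near the #LasVegas strip, stay away http://t.co/abc",
    "Great brunch in Vegas this morning!",
    "Praying for everyone hurt in the Las Vegas shooting",
    "New album drops Friday, who is excited?",
    "Road closures on the strip after the shooting @LVMPD",
    "Blood donation lines are long today in Las Vegas",
]
model, labeled = active_learning(pool, simulated_oracle, seed_idx=[0, 1], rounds=2)
for tweet in pool:
    print(f"{predict(model, embed(preprocess(tweet))):.2f}  {tweet}")
```

In a real deployment, the seed set would need at least one relevant and one irrelevant tweet, and the oracle would be a human annotator rather than a keyword rule. Swapping the hashed vectors for real pretrained embeddings would change only `embed`.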

Similar resources

Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages

Microblogging platforms such as Twitter provide active communication channels during mass convergence and emergency events such as earthquakes and typhoons. During the sudden onset of a crisis situation, affected people post useful information on Twitter that can be used for situational awareness and other humanitarian disaster response efforts, if processed timely and effectively. Processing soci...

Twitter as an instrument for crisis response: The Typhoon Haiyan case study

The research presented in this paper attempts an initial evaluation of Twitter as an instrument for emergency response in the context of a recent crisis event. The case of the 2013 disaster, when Typhoon Haiyan hit the Philippines, is examined by analyzing nine consecutive days of Twitter messages and comparing them to the actual events. The results indicate that during disasters, Twitter users tend...

Twitter Explodes with Activity in Mumbai Blasts! A Lifeline or an Unmonitored Daemon in the Lurking?

Online social media has become an integral part of every Internet user's life. It has given common people a platform and forum to share information, post their opinions and promote campaigns. The threat of exploitation of social media like Facebook, Twitter, etc. by malicious entities becomes crucial during a crisis situation, like bomb blasts or natural calamities such as earthquakes and floo...

Crisis Communication on Twitter during a Global Crisis of Volkswagen – The Case of “Dieselgate”

In this study, we investigate the communication behaviour in Twitter during the rise of a corporate crisis. In September 2015, the emission scandal of Volkswagen (also known as “Dieselgate”) became public. We collected Twitter data and analysed approximately 400,000 tweets regarding the Volkswagen crisis. We take different perspectives on the data, by 1) separating the overall communication in ...

TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text

Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every...

Journal:
  • CoRR

Volume: abs/1801.05881

Publication date: 2018